primal dual interpretation
Review for NeurIPS paper: Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm
Additional Feedback: Post rebuttal: The authors addressed my comments. Therefore, I keep my score as'accept' but not higher as I think the clarity of the writing should be improved. When G is nonsmooth and proximable, using proximal maps lead to much faster convergence compared to using subgradients in the optimization case. It is therefore an important problem to investigate the sampling analogue of this scheme which is the topic of this paper. As mentioned, some previous work has been done on this problem, but this paper presents an approach that is most general (in terms of G being supported on a more general set) to date.
Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm
We consider the task of sampling with respect to a log concave probability distribution. The potential of the target distribution is assumed to be composite, i.e., written as the sum of a smooth convex term, and a nonsmooth convex term possibly taking infinite values. The target distribution can be seen as a minimizer of the Kullback-Leibler divergence defined on the Wasserstein space (i.e., the space of probability measures). In the first part of this paper, we establish a strong duality result for this minimization problem. In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm.